Nonparametric multi-assignment clustering
Authors
Abstract
Multi-label learning has attracted significant attention from the machine learning and data mining communities over the last decade. Although many multi-label classification algorithms have been devised, few research studies focus on multi-assignment clustering (MAC), in which a data instance can be assigned to multiple clusters. The MAC problem is practical in many application domains, such as document clustering, customer segmentation and image clustering. Additionally, specifying the number of clusters is a difficult but critical problem for a certain class of clustering algorithms. Hence, this work proposes a nonparametric multi-assignment clustering algorithm called the multi-assignment Chinese restaurant process (MACRP), which allows the model complexity to grow as more data instances are observed. The proposed algorithm determines the number of clusters from data, so it provides a practical model to process massive data sets. In the proposed algorithm, we devise a novel prior distribution based on the similarity graph to achieve the goal of multi-assignment, and propose a Gibbs sampling algorithm to carry out posterior inference. The implementation in this work uses collapsed Gibbs sampling and is compared with several alternative methods. Additionally, previous evaluation metrics used by multi-label classification are inappropriate for MAC, since label information is unavailable. This work further devises an evaluation metric for MAC based on the characteristics of clustering and multi-assignment problems. We conduct experiments on two real data sets, and the experimental results indicate that the proposed method is competitive and outperforms the alternatives on most data sets.
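For readers unfamiliar with the Chinese restaurant process mentioned in the abstract, the following is a minimal sketch of the standard, single-assignment CRP prior. It is not the MACRP prior proposed in the paper, whose similarity-graph construction is not given in the abstract. The function name `sample_crp` and the parameters `alpha` and `seed` are illustrative assumptions. The sketch shows only how the number of clusters can grow with the number of observations instead of being fixed in advance.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Draw table (cluster) assignments from a standard CRP prior.

    Illustrative only: this is the textbook single-assignment CRP,
    not the multi-assignment prior described in the abstract.
    alpha > 0 is the concentration parameter.
    """
    rng = random.Random(seed)
    assignments = []  # table index chosen by each customer
    table_sizes = []  # number of customers seated at each table
    for _ in range(n_customers):
        # An existing table k is chosen with weight table_sizes[k];
        # a new table is opened with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)  # open a new table
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes
```

Calling `sample_crp(100, alpha=1.0)` returns one table index per customer together with the table sizes. Larger values of `alpha` tend to open more tables, which is the sense in which model complexity is allowed to grow with the data.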
Similar resources
Interference-Aware and Cluster Based Multicast Routing in Multi-Radio Multi-Channel Wireless Mesh Networks
Multicast routing is one of the most important services in Multi Radio Multi Channel (MRMC) Wireless Mesh Networks (WMN). Multicast routing performance in WMNs could be improved by choosing the best routes and the routes that have minimum interference to reach multicast receivers. In this paper we want to address the multicast routing problem for a given channel assignment in WMNs. The channels...
On a Theory of Nonparametric Pairwise Similarity for Clustering: Connecting Clustering to Classification
Pairwise clustering methods partition the data space into clusters by the pairwise similarity between data points. The success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, where kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework by bridging the gap between clustering and multi-class class...
A Nonparametric Multi-seed Data Clustering Technique
Clustering of data around one seed does not work well if the shape of the cluster is elongated or non-convex. A complex shaped cluster requires several seeds. This study developed a nonparametric multi-seed data clustering approach which splits and merges procedures to handle the complex shapes of clusters. The splitting process utilizes a genetic algorithm to search for the appropriate cluster...
Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model
We have proposed a novel speaker clustering method based on a hierarchically structured utterance-oriented Dirichlet process mixture model. In the proposed method, the number of speakers can be determined from the given data using a nonparametric Bayesian manner and intra-speaker variability is successfully handled by multi-scale mixture modeling. Experimental result showed that the proposed me...
Smooth Image Segmentation by Nonparametric Bayesian Inference
A nonparametric Bayesian model for histogram clustering is proposed to automatically determine the number of segments when Markov Random Field constraints enforce smooth class assignments. The nonparametric nature of this model is implemented by a Dirichlet process prior to control the number of clusters. The resulting posterior can be sampled by a modification of a conjugate-case sampling algo...
Journal:
- Intell. Data Anal.
Volume 21, Issue
Pages -
Publication date: 2017